Abstract-Random coding, expurgated and sphere packing bounds for the E-capacity of a discrete memoryless channel (DMC) are derived by the method of types and the method of graph decomposition. Three decoding rules are considered: the random coding bound is attainable by each of the three rules, but the expurgated bound is achievable only by maximum-likelihood decoding. The sphere packing bound is obtained by very simple combinatorial reasoning based on the method of types. The paper unites and reviews the results of previous publications that are hard to obtain.

Index Terms-Capacity, E-capacity, error probability bounds, rate-reliability function, method of types, method of graph decomposition, decoding rules.



I. Introduction 



Let $\mathcal{X}$, $\mathcal{Y}$ be finite sets and $W = \{W(y|x),\ x \in \mathcal{X},\ y \in \mathcal{Y}\}$ be a stochastic matrix.

Definition 1: A discrete channel $W$ with input alphabet $\mathcal{X}$ and output alphabet $\mathcal{Y}$ is defined by a stochastic matrix of transition probabilities $W : \mathcal{X} \to \mathcal{Y}$. An element $W(y|x)$ of the matrix is the conditional probability of receiving the symbol $y \in \mathcal{Y}$ at the channel's output if the symbol $x \in \mathcal{X}$ is transmitted from the input.

The model for $N$ actions of the channel $W$ is described by the stochastic matrix

$$W^N : \mathcal{X}^N \to \mathcal{Y}^N,$$

an element $W^N(\mathbf{y}|\mathbf{x})$ of which is the conditional probability of receiving the vector $\mathbf{y} \in \mathcal{Y}^N$ when $\mathbf{x} \in \mathcal{X}^N$ is transmitted. Here we consider memoryless channels, which operate at each moment of time independently of the previous or next transmitted or received symbols, so for all $\mathbf{x} \in \mathcal{X}^N$ and $\mathbf{y} \in \mathcal{Y}^N$

$$W^N(\mathbf{y}|\mathbf{x}) = \prod_{n=1}^{N} W(y_n|x_n).$$
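The product formula above can be sketched in a few lines of Python. The channel matrix, the helper name `memoryless_prob` and the example sequences are illustrative assumptions, not taken from the paper.

```python
from itertools import product
from math import prod

def memoryless_prob(W, x, y):
    """W^N(y|x) for a memoryless channel; W[a][b] is W(b|a)."""
    if len(x) != len(y):
        raise ValueError("input and output sequences must have equal length")
    return prod(W[a][b] for a, b in zip(x, y))

# Hypothetical binary channel used only for illustration.
W = {0: {0: 0.9, 1: 0.1}, 1: {0: 0.2, 1: 0.8}}
x = (0, 1, 1)
p = memoryless_prob(W, x, (0, 1, 0))  # W(0|0) * W(1|1) * W(0|1)

# For a fixed input, W^N(.|x) must be a probability distribution on Y^N.
total = sum(memoryless_prob(W, x, y) for y in product((0, 1), repeat=len(x)))
```

Summing over all output sequences gives a quick sanity check that each row of $W^N$ is itself a probability distribution.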

Let $\mathcal{M}$ denote the set of messages to be transmitted and $M$ the number of messages.

Definition 2: A code $(f, g)$ for the channel $W$ is a pair of mappings, where $f : \mathcal{M} \to \mathcal{X}^N$ is the encoding and $g : \mathcal{Y}^N \to \mathcal{M}$ is the decoding. $N$ is called the code length, and $M$ is called the code volume.



* The author is with the Institute for Informatics and Automation Problems of the Armenian National Academy of Sciences, 1 P. Sevak str., Yerevan 0014, Armenia. E-mail: evhar@ipia.sci.am.
Definition 3: The probability of erroneous transmission of the message $m \in \mathcal{M}$ over the channel using the code $(f, g)$ is defined as

$$e(m, f, g, N, W) \triangleq W^N(\mathcal{Y}^N - g^{-1}(m) \mid f(m)) = 1 - W^N(g^{-1}(m) \mid f(m)). \tag{1}$$

We shall consider the maximal probability of error of the code $(f, g)$:

$$e(f, g, N, W) \triangleq \max_{m \in \mathcal{M}} e(m, f, g, N, W),$$

and the optimal maximal probability of error for the channel $W$:

$$e(M, N, W) \triangleq \min_{(f, g)} e(f, g, N, W),$$

where the minimum is taken over codes $(f, g)$ of volume $M$. The average probability of error for equiprobable messages is

$$\bar{e}(f, g, N, W) \triangleq \frac{1}{M} \sum_{m \in \mathcal{M}} e(m, f, g, N, W),$$

with $\bar{e}(M, N, W)$ as the minimal average probability among all possible codes of length $N$ and volume $M$. It is clear that always

$$\bar{e}(f, g, N, W) \le e(f, g, N, W).$$

Definition 4: The transmission rate of a code $(f, g)$ of volume $M$ is

$$R(f, g, N) \triangleq \frac{1}{N} \log M. \tag{2}$$

Note that in this paper all exp-s and log-s are to the base two.
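As a small worked instance of Definitions 3 and 4, the sketch below computes the maximal and average error probabilities and the rate of a code by brute-force enumeration of all output vectors. The length-3 repetition code, the majority decoder and the crossover probability 0.1 are assumptions chosen only for illustration.

```python
from itertools import product
from math import log2, prod

def error_probabilities(codewords, decode, W, out_alphabet):
    """Return (maximal, average) error probability of the code (f, g).

    codewords[m] plays the role of f(m); decode(y) plays the role of g(y).
    """
    n = len(codewords[0])
    errors = []
    for m, x in enumerate(codewords):
        # W^N(g^{-1}(m) | f(m)): total probability of outputs decoded to m.
        p_correct = sum(
            prod(W[a][b] for a, b in zip(x, y))
            for y in product(out_alphabet, repeat=n)
            if decode(y) == m
        )
        errors.append(1.0 - p_correct)
    return max(errors), sum(errors) / len(errors)

w = 0.1  # hypothetical crossover probability of a binary symmetric channel
W = {0: {0: 1 - w, 1: w}, 1: {0: w, 1: 1 - w}}
codewords = [(0, 0, 0), (1, 1, 1)]

def majority(y):
    return int(sum(y) >= 2)

e_max, e_avg = error_probabilities(codewords, majority, W, (0, 1))
rate = log2(len(codewords)) / len(codewords[0])  # (1/N) log M, base two
```

For this symmetric example both error probabilities equal $3w^2(1-w) + w^3$, and the average never exceeds the maximal one.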

We consider codes whose error probability decreases exponentially with exponent $E$ as $N \to \infty$:

$$e(f, g, N, W) = \exp\{-NE\}. \tag{3}$$

Denote by $M(E, N, W)$ the best volume of a code of length $N$ for the channel $W$ satisfying condition (3) for a given reliability $E > 0$.

Definition 5: The rate-reliability function, which by analogy with the capacity we call E-capacity, is for the maximal probability of error

$$R(E, W) = C(E, W) \triangleq \lim_{N \to \infty} \frac{1}{N} \log M(E, N, W),$$

and $\bar{R}(E, W)$ for the case of the average probability of error. As in the case of capacity, E-capacity is called maximal or average and denoted, correspondingly, $C(E, W)$ or $\bar{C}(E, W)$, depending on which error probability is considered in (3).

The concept of E-capacity was first considered by the author in [1], where the derivation of the upper bound $R_{sp}(E, W)$ was stated. A simple combinatorial proof of $R_{sp}(E, W)$ was obtained in [2]. For completeness we present it in Section IV, because the paper [2] is little known.

Alternative methods for proving the existence part of coding theorems are Shannon's random coding method and Wolfowitz's maximal code method. In [3] Csiszar and Korner introduced a new original method, based on the lemma of Lovasz on graph decomposition. Different methods of error exponent investigation were presented in [4] - [13] and in many other works. Here we shall derive lower bounds for $R(E, W)$ using the method of graph decomposition.



II. Formulation of Results 

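We first recall our notation for the necessary characteristics: Shannon's entropy and mutual information, and the Kullback-Leibler divergence. The size of the set $\mathcal{X}$ is denoted by $|\mathcal{X}|$. Let $P$ be a probability distribution (PD) of a random variable (RV) $X$,

$$P = \{P(x),\ x \in \mathcal{X}\},$$

and let $V$ be a conditional PD of RV $Y$ for a given value $x$ of RV $X$,

$$V = \{V(y|x),\ x \in \mathcal{X},\ y \in \mathcal{Y}\}.$$

The joint PD of RVs $X$ and $Y$ is

$$P \circ V = \{P \circ V(x, y) = P(x)V(y|x),\ x \in \mathcal{X},\ y \in \mathcal{Y}\},$$

and the PD of RV $Y$ is

$$PV = \Big\{PV(y) = \sum_{x \in \mathcal{X}} P(x)V(y|x),\ y \in \mathcal{Y}\Big\}.$$

Sometimes we need to consider a stochastic matrix $\hat{V} : \mathcal{X} \to \mathcal{X}$ of conditional probabilities

$$\hat{V} = \{\hat{V}(\tilde{x}|x),\ x \in \mathcal{X},\ \tilde{x} \in \mathcal{X}\}.$$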

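We use the following notation: for the entropy of RV $X$ with PD $P$:

$$H_P(X) \triangleq -\sum_{x \in \mathcal{X}} P(x) \log P(x),$$

for the joint entropy of RVs $X$ and $Y$:

$$H_{P,V}(X, Y) \triangleq -\sum_{x \in \mathcal{X},\, y \in \mathcal{Y}} P(x)V(y|x) \log P(x)V(y|x),$$

for the conditional entropy of RV $Y$ relative to RV $X$:

$$H_{P,V}(Y|X) \triangleq -\sum_{x \in \mathcal{X},\, y \in \mathcal{Y}} P(x)V(y|x) \log V(y|x),$$

for the mutual information of RVs $X$ and $Y$:

$$I_{P,V}(X \wedge Y) \triangleq \sum_{x \in \mathcal{X},\, y \in \mathcal{Y}} P(x)V(y|x) \log \frac{V(y|x)}{PV(y)},$$

for the informational divergence of PD $P$ and PD $Q$ on $\mathcal{X}$:

$$D(P\|Q) \triangleq \sum_{x \in \mathcal{X}} P(x) \log \frac{P(x)}{Q(x)},$$

and for the informational conditional divergence of PD $P \circ V$ and PD $P \circ W$ on $\mathcal{X} \times \mathcal{Y}$:

$$D(V\|W|P) \triangleq \sum_{x \in \mathcal{X},\, y \in \mathcal{Y}} P(x)V(y|x) \log \frac{V(y|x)}{W(y|x)}.$$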



The following identities are often useful:

$$D(P \circ V \| Q \circ W) = D(P\|Q) + D(V\|W|P),$$

$$H_{P,V}(X, Y) = H_P(X) + H_{P,V}(Y|X) = H_{PV}(Y) + H_{P,V}(X|Y),$$

$$I_{P,V}(X \wedge Y) = H_{PV}(Y) - H_{P,V}(Y|X) = H_P(X) + H_{PV}(Y) - H_{P,V}(X, Y).$$
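These identities are easy to check numerically. The sketch below implements the quantities just defined for distributions stored as Python lists and verifies two of the identities. The particular distributions are arbitrary strictly positive examples and all function names are assumptions made for this illustration.

```python
from math import log2

def entropy(P):
    return -sum(p * log2(p) for p in P if p > 0)

def output_pd(P, V):
    """PV(y) = sum_x P(x) V(y|x); V[x][y] stores V(y|x)."""
    return [sum(P[x] * V[x][y] for x in range(len(P))) for y in range(len(V[0]))]

def cond_entropy(P, V):
    return -sum(P[x] * v * log2(v) for x in range(len(P)) for v in V[x] if P[x] * v > 0)

def mutual_information(P, V):
    PV = output_pd(P, V)
    return sum(
        P[x] * V[x][y] * log2(V[x][y] / PV[y])
        for x in range(len(P)) for y in range(len(PV))
        if P[x] * V[x][y] > 0
    )

def divergence(P, Q):
    return sum(p * log2(p / q) for p, q in zip(P, Q) if p > 0)

def cond_divergence(V, W, P):
    return sum(P[x] * divergence(V[x], W[x]) for x in range(len(P)))

# Arbitrary example distributions (assumed, strictly positive).
P, Q = [0.3, 0.7], [0.5, 0.5]
V = [[0.8, 0.2], [0.1, 0.9]]
W = [[0.9, 0.1], [0.1, 0.9]]

joint_PV = [P[x] * V[x][y] for x in range(2) for y in range(2)]
joint_QW = [Q[x] * W[x][y] for x in range(2) for y in range(2)]

# D(P o V || Q o W) - (D(P||Q) + D(V||W|P)) should vanish.
chain_rule_gap = divergence(joint_PV, joint_QW) - (divergence(P, Q) + cond_divergence(V, W, P))
# I(X ^ Y) - (H_PV(Y) - H(Y|X)) should vanish.
information_gap = mutual_information(P, V) - (entropy(output_pd(P, V)) - cond_entropy(P, V))
```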

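The proofs in this paper are based on the method of types [9]. The type $P$ of a sequence (or vector) $\mathbf{x} = (x_1, \dots, x_N) \in \mathcal{X}^N$ is the PD $P = \{P(x) = N(x|\mathbf{x})/N,\ x \in \mathcal{X}\}$, where $N(x|\mathbf{x})$ is the number of occurrences of the symbol $x$ among $x_1, \dots, x_N$. The joint type of $\mathbf{x}$ and $\mathbf{y} \in \mathcal{Y}^N$ is the PD $\{N(x, y|\mathbf{x}, \mathbf{y})/N,\ x \in \mathcal{X},\ y \in \mathcal{Y}\}$, where $N(x, y|\mathbf{x}, \mathbf{y})$ is the number of occurrences of the pair of symbols $(x, y)$ in the pair of vectors $(\mathbf{x}, \mathbf{y})$. We say that the conditional type of $\mathbf{y}$ for given $\mathbf{x}$ is the PD $V = \{V(y|x),\ x \in \mathcal{X},\ y \in \mathcal{Y}\}$ if $N(x, y|\mathbf{x}, \mathbf{y}) = N(x|\mathbf{x})V(y|x)$ for all $x \in \mathcal{X}$, $y \in \mathcal{Y}$.

The set of all PDs on $\mathcal{X}$ is denoted by $\mathcal{P}(\mathcal{X})$, and the subset of $\mathcal{P}(\mathcal{X})$ consisting of the possible types of sequences $\mathbf{x} \in \mathcal{X}^N$ is denoted by $\mathcal{P}_N(\mathcal{X})$. The set of vectors $\mathbf{x}$ of type $P$ is denoted by $\mathcal{T}_P^N(X)$; $\mathcal{T}_P^N(X) = \emptyset$ for a PD $P \in \mathcal{P}(\mathcal{X}) - \mathcal{P}_N(\mathcal{X})$. The set of all sequences $\mathbf{y} \in \mathcal{Y}^N$ of conditional type $V$ for given $\mathbf{x} \in \mathcal{T}_P^N(X)$ is denoted by $\mathcal{T}_{P,V}^N(Y|\mathbf{x})$ and called the $V$-shell of $\mathbf{x}$. The set of all possible $V$-shells for $\mathbf{x}$ of type $P$ is denoted by $\mathcal{V}_N(\mathcal{Y}, P)$.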
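The definitions of type, joint type and conditional type translate directly into counting code. The following sketch uses exact rational arithmetic. The function names and the example sequences are assumptions made for illustration.

```python
from collections import Counter
from fractions import Fraction

def type_of(x, alphabet):
    """Type P of the sequence x: P(a) = N(a|x) / N."""
    counts = Counter(x)
    return {a: Fraction(counts[a], len(x)) for a in alphabet}

def joint_type(x, y, in_alphabet, out_alphabet):
    """Joint type of (x, y): N(a, b | x, y) / N."""
    counts = Counter(zip(x, y))
    return {(a, b): Fraction(counts[(a, b)], len(x))
            for a in in_alphabet for b in out_alphabet}

def conditional_type(x, y, in_alphabet, out_alphabet):
    """Conditional type V of y given x, for symbols a that occur in x.

    Rows for symbols that never occur in x are left out, since the defining
    relation N(a, b | x, y) = N(a | x) V(b | a) does not constrain them.
    """
    n_x = Counter(x)
    n_xy = Counter(zip(x, y))
    return {a: {b: Fraction(n_xy[(a, b)], n_x[a]) for b in out_alphabet}
            for a in in_alphabet if n_x[a] > 0}

x = (0, 0, 1, 1, 1, 0)
y = (0, 1, 1, 1, 0, 0)
P = type_of(x, (0, 1))
J = joint_type(x, y, (0, 1), (0, 1))
V = conditional_type(x, y, (0, 1), (0, 1))
```

In this example $\mathbf{y}$ lies in the $V$-shell of $\mathbf{x}$ with $V(0|0) = V(1|1) = 2/3$, and the joint type factors as $P(x)V(y|x)$.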

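The following lemmas formulate very useful properties of types; for proofs see [9].

Lemma 1 (Type counting):

$$|\mathcal{P}_N(\mathcal{X})| \le (N + 1)^{|\mathcal{X}|}, \tag{4}$$

$$|\mathcal{V}_N(\mathcal{Y}, P)| \le (N + 1)^{|\mathcal{X}||\mathcal{Y}|}. \tag{5}$$

Lemma 2: For any type $P \in \mathcal{P}_N(\mathcal{X})$

$$(N + 1)^{-|\mathcal{X}|} \exp\{N H_P(X)\} \le |\mathcal{T}_P^N(X)| \le \exp\{N H_P(X)\}, \tag{6}$$

and for any conditional type $V$ and $\mathbf{x} \in \mathcal{T}_P^N(X)$

$$(N + 1)^{-|\mathcal{X}||\mathcal{Y}|} \exp\{N H_{P,V}(Y|X)\} \le |\mathcal{T}_{P,V}^N(Y|\mathbf{x})| \le \exp\{N H_{P,V}(Y|X)\}. \tag{7}$$

Lemma 3: If $\mathbf{x} \in \mathcal{T}_P^N(X)$, then for every PD $Q$ on $\mathcal{X}$

$$Q^N(\mathbf{x}) = \prod_{n=1}^{N} Q(x_n) = \exp\{-N(H_P(X) + D(P\|Q))\}. \tag{8}$$

If $\mathbf{y} \in \mathcal{T}_{P,V}^N(Y|\mathbf{x})$, then for every conditional PD $W$ on $\mathcal{Y}$ for given $x$

$$W^N(\mathbf{y}|\mathbf{x}) = \exp\{-N(H_{P,V}(Y|X) + D(V\|W|P))\}. \tag{9}$$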
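For small parameters the statements (4), (6) and (8) can be confirmed by exhaustive enumeration. The sketch below does so for a ternary alphabet and $N = 5$. The alphabet, the length and the distribution `Q` are arbitrary assumptions, and the check is an illustration, not a proof.

```python
from collections import Counter
from itertools import product
from math import log2, prod

def entropy(P):
    return -sum(p * log2(p) for p in P if p > 0)

def divergence(P, Q):
    return sum(p * log2(p / q) for p, q in zip(P, Q) if p > 0)

ALPHABET = (0, 1, 2)
N = 5
Q = (0.5, 0.3, 0.2)  # arbitrary strictly positive PD (assumed)

# Group all |X|^N sequences by type, stored as a tuple of symbol counts.
class_sizes = Counter(
    tuple(x.count(a) for a in ALPHABET) for x in product(ALPHABET, repeat=N)
)

type_count_ok = len(class_sizes) <= (N + 1) ** len(ALPHABET)  # bound (4)

size_bounds_ok = True
max_gap = 0.0
for counts, size in class_sizes.items():
    P = [c / N for c in counts]
    upper = 2 ** (N * entropy(P))
    lower = upper / (N + 1) ** len(ALPHABET)
    # Bound (6); a tiny relative slack absorbs floating-point rounding.
    size_bounds_ok = size_bounds_ok and (lower <= size <= upper * (1 + 1e-9))
    # Q^N is constant on a type class, so one representative is enough.
    representative = [a for a, c in zip(ALPHABET, counts) for _ in range(c)]
    q_n = prod(Q[a] for a in representative)
    predicted = 2 ** (-N * (entropy(P) + divergence(P, Q)))  # identity (8)
    max_gap = max(max_gap, abs(q_n - predicted) / q_n)
```

Identity (8) is exact, so `max_gap` is zero up to rounding error, whereas (4) and (6) hold with considerable slack for such a small $N$.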

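Consider the random coding exponent function $R_r(E, W)$, which is a lower estimate for $C(E, W) = R(E, W)$:

$$R_r(P, E, W) \triangleq \min_{V:\, D(V\|W|P) \le E} \big| I_{P,V}(X \wedge Y) + D(V\|W|P) - E \big|^+,$$

$$R_r(E, W) \triangleq \max_P R_r(P, E, W). \tag{10}$$

Here and below $|a|^+ = \max(a, 0)$.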

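The expurgated exponent function $R_x(E, W)$, which is another lower estimate for $R(E, W)$, is defined using a probability matrix $\hat{V} : \mathcal{X} \to \mathcal{X}$:

$$R_x(P, E, W) \triangleq \min_{\hat{V}} \big\{ I_{P,\hat{V}}(X \wedge \tilde{X}) + \big| \mathbf{E}_{P,\hat{V}}\, d_B(X, \tilde{X}) - E \big|^+ \big\};$$

here $d_B(x, \tilde{x})$ is the Bhattacharyya distance

$$d_B(x, \tilde{x}) \triangleq -\log \sum_{y \in \mathcal{Y}} \sqrt{W(y|x)W(y|\tilde{x})}, \tag{11}$$

and

$$R_x(E, W) \triangleq \max_P R_x(P, E, W).$$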
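The Bhattacharyya distance (11) and its expectation under $P \circ \hat{V}$ are straightforward to evaluate. The sketch below does this for a binary symmetric channel. The channel, the distribution `P` and the matrix `V_hat` are assumed purely for illustration.

```python
from math import log2, sqrt

def bhattacharyya_distance(W, x, x_tilde):
    """d_B(x, x~) = -log2 sum_y sqrt(W(y|x) W(y|x~)); W[x][y] stores W(y|x)."""
    return -log2(sum(sqrt(p * q) for p, q in zip(W[x], W[x_tilde])))

def expected_distance(P, V_hat, W):
    """E_{P,V^} d_B(X, X~) = sum_{x, x~} P(x) V^(x~|x) d_B(x, x~)."""
    return sum(
        P[x] * V_hat[x][xt] * bhattacharyya_distance(W, x, xt)
        for x in range(len(P)) for xt in range(len(P))
        if P[x] * V_hat[x][xt] > 0
    )

W = [[0.9, 0.1], [0.1, 0.9]]      # hypothetical BSC with crossover 0.1
P = [0.5, 0.5]
V_hat = [[0.5, 0.5], [0.5, 0.5]]  # X~ independent of X, both uniform

d01 = bhattacharyya_distance(W, 0, 1)  # -log2(2 * sqrt(0.09)) = -log2(0.6)
d00 = bhattacharyya_distance(W, 0, 0)  # identical rows give distance 0
mean_d = expected_distance(P, V_hat, W)
```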



The sphere packing exponent function serves as an upper bound for $R(E, W)$:

$$R_{sp}(P, E, W) \triangleq \min_{V:\, D(V\|W|P) \le E} I_{P,V}(X \wedge Y), \tag{12}$$

$$R_{sp}(E, W) \triangleq \max_P R_{sp}(P, E, W).$$

It was first considered in [1].
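For a binary channel the minimizations in (10) and (12) can be approximated by a plain grid search over all $2 \times 2$ stochastic matrices $V$. The sketch below is such a brute-force approximation for a fixed input distribution $P$. It assumes that $W$ has strictly positive entries, the grid resolution and the example channel are arbitrary, and it is not meant as an efficient algorithm.

```python
from math import log2

def xlog_ratio(p, q):
    """p * log2(p / q) with the convention 0 * log 0 = 0."""
    return 0.0 if p == 0 else p * log2(p / q)

def mutual_information(P, V):
    PV = [sum(P[x] * V[x][y] for x in range(2)) for y in range(2)]
    return sum(P[x] * xlog_ratio(V[x][y], PV[y])
               for x in range(2) for y in range(2) if P[x] > 0)

def cond_divergence(V, W, P):
    return sum(P[x] * xlog_ratio(V[x][y], W[x][y])
               for x in range(2) for y in range(2))

def exponent_functions(P, W, E, n=100):
    """Grid approximations of (R_r(P, E, W), R_sp(P, E, W))."""
    best_r = best_sp = float("inf")
    for i in range(n + 1):
        for j in range(n + 1):
            a, b = i / n, j / n
            V = [[1 - a, a], [b, 1 - b]]
            D = cond_divergence(V, W, P)
            if D <= E + 1e-12:  # feasibility D(V||W|P) <= E
                I = mutual_information(P, V)
                best_sp = min(best_sp, I)
                best_r = min(best_r, max(I + D - E, 0.0))
    return best_r, best_sp

W = [[0.9, 0.1], [0.1, 0.9]]  # hypothetical BSC with crossover 0.1
P = [0.5, 0.5]
capacity = 1 + 0.1 * log2(0.1) + 0.9 * log2(0.9)

r0, sp0 = exponent_functions(P, W, 0.0)
r1, sp1 = exponent_functions(P, W, 0.05)
```

At $E = 0$ only $V = W$ is feasible, so both functions reduce to $I_{P,W}(X \wedge Y)$, and for $E > 0$ the random coding value never exceeds the sphere packing value.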

Theorem 1: For a DMC $W$ and any $E > 0$ the following bound holds:

$$R(E, W) \ge \max(R_r(E, W), R_x(E, W)).$$

Theorem 2: For a DMC $W$ and $E > 0$ the following inequalities hold:

$$R(E, W) \le \bar{R}(E, W) \le R_{sp}(E, W).$$

Theorem 3: For $0 < E \le E_{cr}(P, W)$, where

$$E_{cr}(P, W) \triangleq \min\Big\{E : \frac{\partial R_{sp}(P, E, W)}{\partial E} \ge -1\Big\},$$

the estimates are equal to each other and give the E-capacity:

$$R(P, E, W) = R_{sp}(P, E, W) = R_r(P, E, W).$$

For $0 < E \le E_{cr}(W)$, where

$$E_{cr}(W) \triangleq \min\Big\{E : \frac{\partial R_{sp}(E, W)}{\partial E} \ge -1\Big\} = \max_P E_{cr}(P, W),$$

the equality holds:

$$R(E, W) = R_{sp}(E, W) = R_r(E, W).$$

Remark 1: For $E \to 0$

$$\lim_{E \to 0} R_{sp}(P, E, W) = \lim_{E \to 0} R_r(P, E, W) = I_{P,W}(X \wedge Y),$$

$$R_{sp}(0, W) = R_r(0, W) = C(W).$$



III. Proof of Theorem 1 

Lemma 4: Consider a finite set $A$ and a nonnegative function $\nu$ on $A \times A$ such that for every $a, b \in A$

$$\nu(a, b) = \nu(b, a), \qquad \nu(a, a) = 0.$$

If for some $t$, for each $a \in A$

$$\sum_{b \in A} \nu(a, b) < t,$$

and $t_1, t_2, \dots, t_S$ are nonnegative numbers such that

$$\sum_{s=1}^{S} t_s = t,$$

then $A$ can be divided into $S$ disjoint subsets $A_1, \dots, A_S$ such that for every $a \in A_s$

$$\sum_{b \in A_s} \nu(a, b) < t_s, \qquad s = 1, \dots, S.$$

For the proof of the Lemma see [3]. There, lower bounds for the reliability function $E(R, W)$ of a DMC and of sources with side information were obtained using Lemma 4. We now present a similar derivation of the random coding and expurgated bounds for the E-capacity $R(E, W)$ of a DMC.
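To make the statement of Lemma 4 concrete, the sketch below constructs such a partition for the special case of equal thresholds $t_s = t/S$, which is the only case used later in the proof of Lemma 5. It uses a simple local search and is not the argument of [3]: an element is moved whenever another class carries strictly less of its weight. With integer weights every move lowers the total within-class weight by at least one, so the search stops, and at that point each element's within-class weight is at most $1/S$ of its row sum. The random instance and all names are assumptions.

```python
import random

def partition_by_local_search(nu, S):
    """Split range(len(nu)) into S classes; nu is symmetric with zero diagonal
    and nonnegative integer entries. Returns part, where part[a] is a's class."""
    n = len(nu)
    part = [a % S for a in range(n)]
    improved = True
    while improved:
        improved = False
        for a in range(n):
            # Weight that element a shares with each class (nu[a][a] == 0).
            loads = [0] * S
            for b in range(n):
                loads[part[b]] += nu[a][b]
            best = min(range(S), key=loads.__getitem__)
            if loads[best] < loads[part[a]]:
                part[a] = best  # strictly reduces total within-class weight
                improved = True
    return part

def own_load(nu, part, a):
    return sum(nu[a][b] for b in range(len(nu)) if part[b] == part[a])

rng = random.Random(0)
n, S = 12, 3
nu = [[0] * n for _ in range(n)]
for a in range(n):
    for b in range(a + 1, n):
        nu[a][b] = nu[b][a] = rng.randint(0, 9)

t = max(sum(row) for row in nu) + 1  # guarantees every row sum is < t
part = partition_by_local_search(nu, S)
worst = max(own_load(nu, part, a) for a in range(n))
```

The final check `worst * S < t` is exactly the conclusion of the lemma for $t_s = t/S$.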

Theorem 1 formulated above is a consequence of the following theorem.

Theorem 4: For a DMC $W : \mathcal{X} \to \mathcal{Y}$, any $E > 0$, $\delta > 0$ and type $P \in \mathcal{P}_N(\mathcal{X})$, for sufficiently large $N$ codes $(f, g)$ exist such that

$$e(f, g, N, W) = \exp\{-N(E + \delta)\} \tag{13}$$

and

$$R(P, f, g, N) \ge \max(R_r(P, E + \delta, W), R_x(P, E + \delta, W)).$$

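The proof of Theorem 4 consists of several steps. First we shall prove the following lemma.

Lemma 5: For a given type $P \in \mathcal{P}_N(\mathcal{X})$ and for any $0 < r < |\mathcal{T}_P^N(X)|$ a set $C$ exists such that $C \subset \mathcal{T}_P^N(X)$, $|C| \ge r$, and for any $\tilde{\mathbf{x}} \in C$ and matrix $\hat{V} : \mathcal{X} \to \mathcal{X}$ different from the identity matrix the following inequality holds:

$$|\mathcal{T}_{P,\hat{V}}^N(X|\tilde{\mathbf{x}}) \cap C| \le r\, |\mathcal{T}_{P,\hat{V}}^N(X|\tilde{\mathbf{x}})| \exp\{-N(H_P(X) - \delta_N)\}, \tag{14}$$

where

$$\delta_N = N^{-1}\big[(|\mathcal{X}|^2 + |\mathcal{X}|)\log(N + 1) + 1\big].$$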
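Proof: To apply Lemma 4, let $A = \mathcal{T}_P^N(X)$ and

$$\nu(\mathbf{x}, \tilde{\mathbf{x}}) = \begin{cases} |\mathcal{T}_{P,\hat{V}}^N(X|\tilde{\mathbf{x}})|^{-1}, & \text{if } \mathbf{x} \ne \tilde{\mathbf{x}} \text{ and } \mathbf{x} \in \mathcal{T}_{P,\hat{V}}^N(X|\tilde{\mathbf{x}}), \\ 0, & \text{if } \mathbf{x} = \tilde{\mathbf{x}}. \end{cases}$$

Because $\mathbf{x}$ and $\tilde{\mathbf{x}}$ are of the same type $P$, when $\mathbf{x} \in \mathcal{T}_{P,\hat{V}}^N(X|\tilde{\mathbf{x}})$, then $\tilde{\mathbf{x}} \in \mathcal{T}_{P,\hat{V}'}^N(X|\mathbf{x})$, where $\hat{V}'$ is such that the matrix with elements $P(\tilde{x})\hat{V}(x|\tilde{x})$ is the transpose of the matrix with elements $P(x)\hat{V}'(\tilde{x}|x)$, $x, \tilde{x} \in \mathcal{X}$. Here $NP(\tilde{x})\hat{V}(x|\tilde{x}) = N(x, \tilde{x}|\mathbf{x}, \tilde{\mathbf{x}})$ and $NP(x)\hat{V}'(\tilde{x}|x) = N(\tilde{x}, x|\tilde{\mathbf{x}}, \mathbf{x})$ define, correspondingly, the joint types of the pairs $(\mathbf{x}, \tilde{\mathbf{x}})$ and $(\tilde{\mathbf{x}}, \mathbf{x})$. We have that

$$|\mathcal{T}_{P,\hat{V}}^N(X|\tilde{\mathbf{x}})| = \frac{\prod_{\tilde{x}} (NP(\tilde{x}))!}{\prod_{x, \tilde{x}} (N(x, \tilde{x}|\mathbf{x}, \tilde{\mathbf{x}}))!}$$

and

$$|\mathcal{T}_{P,\hat{V}'}^N(X|\mathbf{x})| = \frac{\prod_{x} (NP(x))!}{\prod_{x, \tilde{x}} (N(\tilde{x}, x|\tilde{\mathbf{x}}, \mathbf{x}))!}.$$

The right sides are equal, so we see that $\nu(\mathbf{x}, \tilde{\mathbf{x}}) = \nu(\tilde{\mathbf{x}}, \mathbf{x})$. Since at most one term of unit weight arises for each non-identity matrix, we also have from (5)

$$\sum_{\tilde{\mathbf{x}} \in \mathcal{T}_P^N(X)} \nu(\mathbf{x}, \tilde{\mathbf{x}}) = \sum_{\hat{V}'} \sum_{\tilde{\mathbf{x}} \in \mathcal{T}_{P,\hat{V}'}^N(X|\mathbf{x})} \nu(\mathbf{x}, \tilde{\mathbf{x}}) < (N + 1)^{|\mathcal{X}|^2}.$$

If we take $t = (N + 1)^{|\mathcal{X}|^2}$ and $t_s = t/S$, $s = 1, \dots, S$, then according to Lemma 4 there exists a partition of $\mathcal{T}_P^N(X)$ into subsets $A_s$, $s = 1, \dots, S$, such that for each $\tilde{\mathbf{x}}$ from $A_s$

$$|\mathcal{T}_{P,\hat{V}}^N(X|\tilde{\mathbf{x}}) \cap A_s| \le \frac{1}{S}\, |\mathcal{T}_{P,\hat{V}}^N(X|\tilde{\mathbf{x}})|\, (N + 1)^{|\mathcal{X}|^2}, \qquad s = 1, \dots, S. \tag{15}$$

Taking $C$ equal to the largest $A_s$ and $S$ equal to the integer part of $|\mathcal{T}_P^N(X)|/r$, we obtain $|C| \ge S^{-1}|\mathcal{T}_P^N(X)| \ge r$ and inequality (14), which follows from (15) and (6), because

$$\frac{1}{S} = \frac{1}{\lfloor |\mathcal{T}_P^N(X)|/r \rfloor} \le \frac{1}{|\mathcal{T}_P^N(X)|/2r} = \frac{2r}{|\mathcal{T}_P^N(X)|}.$$

Lemma 5 is proved.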

To prove code existence theorems it is possible to consider various "good" decoding rules. To define those rules, following [3], we apply different real-valued functions $\alpha$ defined on $\mathcal{X}^N \times \mathcal{Y}^N$. One says that $g_\alpha$-decoding is used if for each $\mathbf{y}$ from $\mathcal{Y}^N$ at the output of the channel the message $m$ is accepted whose codeword $\mathbf{x}(m)$ minimizes $\alpha(\mathbf{x}(m), \mathbf{y})$. One uses functions $\alpha$ that depend only on the type $P$ of $\mathbf{x}$ and the conditional type $V$ of $\mathbf{y}$ for given $\mathbf{x}$. Such functions $\alpha$ can be written in the form $\alpha(P, V)$, and under the respective decoding the message $m'$ corresponds to the vector $\mathbf{y}$ if

$$(m', V') = \arg \min_{(m, V):\ \mathbf{y} \in \mathcal{T}_{P,V}^N(Y|f(m))} \alpha(P, V).$$

Here $V' = \{V'(y|x),\ x \in \mathcal{X},\ y \in \mathcal{Y}\}$ is a matrix different from $V$ but guaranteeing that

$$\sum_{x} P(x)V'(y|x) = \sum_{x} P(x)V(y|x), \qquad y \in \mathcal{Y},$$

or equivalently, $PV' = PV$.

Previously the following two decoding rules were used [3]. The first is maximum-likelihood decoding, where the accepted codeword $\mathbf{x}(m)$ maximizes the transition probability $W^N(\mathbf{y}|\mathbf{x}(m))$; in this case, according to (9),

$$\alpha(P, V) = D(V\|W|P) + H_{P,V}(Y|X). \tag{16}$$

The second decoding rule, called minimum-entropy decoding, accepts the codeword $\mathbf{x}(m)$ minimizing $H_{P,V}(Y|X)$, that is

$$\alpha(P, V) = H_{P,V}(Y|X). \tag{17}$$

In [11] and [12] another decoding rule was proposed, based on minimization of

$$\alpha(P, V) = D(V\|W|P), \tag{18}$$

which can be called minimum-divergence decoding.
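The three decoding functions (16) - (18) depend on the received vector only through its conditional type relative to each codeword, so they can be computed from symbol-pair counts. The sketch below is a minimal illustration: the codebook, the channel and the received vectors are assumptions, and ties are broken in favour of the first codeword.

```python
from collections import Counter
from math import inf, log2

def cond_entropy_and_divergence(x, y, W):
    """Return (H_{P,V}(Y|X), D(V||W|P)) for the conditional type V of y given x."""
    n = len(x)
    n_x = Counter(x)
    H = D = 0.0
    for (a, b), c in Counter(zip(x, y)).items():
        v = c / n_x[a]  # V(b|a) = N(a, b | x, y) / N(a | x)
        H -= (c / n) * log2(v)
        D += (c / n) * log2(v / W[a][b]) if W[a][b] > 0 else inf
    return H, D

RULES = {
    "maximum_likelihood": lambda H, D: D + H,  # (16)
    "minimum_entropy": lambda H, D: H,         # (17)
    "minimum_divergence": lambda H, D: D,      # (18)
}

def decode(y, codewords, W, rule):
    """Index of the codeword minimizing alpha(P, V) under the chosen rule."""
    alpha = RULES[rule]
    scores = [alpha(*cond_entropy_and_divergence(x, y, W)) for x in codewords]
    return min(range(len(codewords)), key=scores.__getitem__)

W = [[0.9, 0.1], [0.1, 0.9]]                          # hypothetical BSC
codewords = [(0, 0, 0, 1, 1, 1), (0, 1, 1, 0, 0, 1)]  # both of type (1/2, 1/2)

clean = codewords[0]
noisy = (0, 0, 0, 1, 1, 0)  # codeword 0 with its last symbol flipped

decoded_clean = {rule: decode(clean, codewords, W, rule) for rule in RULES}
ml_noisy = decode(noisy, codewords, W, "maximum_likelihood")
md_noisy = decode(noisy, codewords, W, "minimum_divergence")

# By (9), the maximum-likelihood score equals -(1/N) log2 W^N(y|x).
H, D = cond_entropy_and_divergence(codewords[0], noisy, W)
ml_score_gap = (H + D) - (-(5 * log2(0.9) + log2(0.1)) / 6)
```

For the noisy vector the two conditional types happen to have equal conditional entropy, so minimum-entropy decoding alone cannot separate the two codewords in this particular example.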

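Let $\tilde{V} = \{\tilde{V}(y|x, \tilde{x}),\ x \in \mathcal{X},\ \tilde{x} \in \mathcal{X},\ y \in \mathcal{Y}\}$ be a conditional distribution of $Y$ given values of $X$ and $\tilde{X}$ such that

$$\sum_{\tilde{x}} P(x)\hat{V}(\tilde{x}|x)\tilde{V}(y|x, \tilde{x}) = P(x)V(y|x), \qquad x \in \mathcal{X},\ y \in \mathcal{Y}, \tag{19}$$

$$\sum_{x} P(x)\hat{V}(\tilde{x}|x)\tilde{V}(y|x, \tilde{x}) = P(\tilde{x})V'(y|\tilde{x}), \qquad \tilde{x} \in \mathcal{X},\ y \in \mathcal{Y}. \tag{20}$$

Using the notation from [3] we write

$$V' \prec_\alpha V \quad \text{if} \quad \alpha(P, V') \le \alpha(P, V) \ \text{and} \ PV' = PV.$$

Let us denote

$$R_\alpha(P, E, W) \triangleq \min_{\hat{V},\, \tilde{V},\, V':\, V' \prec_\alpha V,\ V:\, D(V\|W|P) \le E} \big\{ I_{P,\hat{V}}(X \wedge \tilde{X}) + \big| I_{P,\hat{V},\tilde{V}}(Y \wedge \tilde{X}|X) + D(V\|W|P) - E \big|^+ \big\}, \tag{21}$$

where the RVs $X$, $\tilde{X}$, $Y$ take values, correspondingly, in $\mathcal{X}$, $\mathcal{X}$, $\mathcal{Y}$ such that the following is valid: both $X$ and $\tilde{X}$ have distribution $P$, so that $P\hat{V} = P$, and $\tilde{V}$ is the conditional distribution of RV $Y$ given $X$ and $\tilde{X}$ satisfying (19) and (20). Minimization in (21) (and later) is understood over the variables ordered from right to left: in (21) the minimum must be taken over $V$ under the condition $D(V\|W|P) \le E$, then over $V'$ under the condition $V' \prec_\alpha V$, then over $\tilde{V}$ under conditions (19), (20), and finally over $\hat{V}$.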

The main part of the proof of Theorem 4 is contained in the following proposition.

Proposition 1: For any DMC $W$, any type $P \in \mathcal{P}_N(\mathcal{X})$, any $E > 0$, $\delta_N > 0$, for all sufficiently large $N$ codes $(f, g_\alpha)$ exist such that

$$e(f, g_\alpha, N, W) = \exp\{-N(E + \delta_N/2)\}, \tag{22}$$

and the rate is large enough:

$$R(P, f, g_\alpha, N) \ge R_\alpha(P, E + \delta_N, W). \tag{23}$$

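Proof: For some $R$ let us write $r = \exp\{N(R - \delta_N)\}$. According to Lemma 5, for $r < |\mathcal{T}_P^N(X)|$ a collection $C \subset \mathcal{T}_P^N(X)$ exists such that $|C| \ge r$ and for each $\tilde{\mathbf{x}} \in C$ and any probability matrix $\hat{V} : \mathcal{X} \to \mathcal{X}$ different from the identity matrix, for $N$ large enough,

$$|\mathcal{T}_{P,\hat{V}}^N(X|\tilde{\mathbf{x}}) \cap C| \le \exp\{N(R - I_{P,\hat{V}}(X \wedge \tilde{X}))\}. \tag{24}$$

Indeed, by (7) the right side of (14) does not exceed $\exp\{N(R - \delta_N)\} \exp\{N H_{P,\hat{V}}(X|\tilde{X})\} \exp\{-N(H_P(X) - \delta_N)\}$. Note that by symmetry we have

$$|\mathcal{T}_{P,\hat{V}'}^N(X|\mathbf{x}) \cap C| \le \exp\{N(R - I_{P,\hat{V}'}(\tilde{X} \wedge X))\} = \exp\{N(R - I_{P,\hat{V}}(X \wedge \tilde{X}))\}. \tag{25}$$

Let us take $C$ as the set of codewords of the code $(f, g_\alpha)$. If $\mathbf{x} \in \mathcal{T}_{P,\hat{V}}^N(X|\tilde{\mathbf{x}})$ exists such that $\mathbf{x} \in C$ and $\tilde{\mathbf{x}} \in C$, then

$$R(P, f, g_\alpha, N) \ge I_{P,\hat{V}}(X \wedge \tilde{X}). \tag{26}$$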

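As in (1),

$$e(m, f, g_\alpha, N, W) = W^N(\mathcal{Y}^N - g_\alpha^{-1}(m) \mid f(m)).$$

In accordance with $g_\alpha$-decoding, the set $\mathcal{Y}^N - g_\alpha^{-1}(m)$ contains all words $\mathbf{y}$ for which a codevector $f(m')$ different from $f(m)$ exists such that, if $f(m) \in \mathcal{T}_{P,\hat{V}}^N(X|f(m'))$ and $\mathbf{y} \in \mathcal{T}_{P,V}^N(Y|f(m)) \cap \mathcal{T}_{P,V'}^N(Y|f(m'))$, then the joint type of the triple $(f(m), f(m'), \mathbf{y})$ is $P \circ \hat{V} \circ \tilde{V}$, with $\tilde{V}$ meeting conditions (19), (20) and $V' \prec_\alpha V$.

Denote by $(\mathcal{Y}^N - g_\alpha^{-1}(m)) \cap \mathcal{T}_{P,\hat{V},\tilde{V}}^N(Y|f(m), f(m'))$ the set of such vectors $\mathbf{y}$ for which the triple $(f(m), f(m'), \mathbf{y})$ has the joint type $P \circ \hat{V} \circ \tilde{V}$ for some $m'$ different from $m$. According to (5) the number of such types does not exceed $(N + 1)^{|\mathcal{X}|^2|\mathcal{Y}|}$. Taking into account that

$$\mathcal{T}_{P,\hat{V},\tilde{V}}^N(Y|f(m), f(m')) \subset \mathcal{T}_{P,V}^N(Y|f(m)), \tag{27}$$

and then using (9), we have

$$W^N(\mathcal{Y}^N - g_\alpha^{-1}(m) \mid f(m)) \le (N + 1)^{|\mathcal{X}|^2|\mathcal{Y}|} \max_{\hat{V}, \tilde{V}, V', V:\, V' \prec_\alpha V}\ \sum_{f(m'):\ f(m) \in \mathcal{T}_{P,\hat{V}}^N(X|f(m')) \cap C} \big|(\mathcal{Y}^N - g_\alpha^{-1}(m)) \cap \mathcal{T}_{P,\hat{V},\tilde{V}}^N(Y|f(m), f(m'))\big|\ W^N(\mathbf{y}|f(m))$$

$$= (N + 1)^{|\mathcal{X}|^2|\mathcal{Y}|} \max_{\hat{V}, \tilde{V}, V', V:\, V' \prec_\alpha V}\ \sum_{f(m'):\ f(m) \in \mathcal{T}_{P,\hat{V}}^N(X|f(m')) \cap C} \big|(\mathcal{Y}^N - g_\alpha^{-1}(m)) \cap \mathcal{T}_{P,\hat{V},\tilde{V}}^N(Y|f(m), f(m'))\big| \exp\{-N(D(V\|W|P) + H_{P,V}(Y|X))\}. \tag{28}$$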
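Using (2), (7) and (24) we bound

$$\sum_{f(m'):\ f(m) \in \mathcal{T}_{P,\hat{V}}^N(X|f(m')) \cap C} \big|(\mathcal{Y}^N - g_\alpha^{-1}(m)) \cap \mathcal{T}_{P,\hat{V},\tilde{V}}^N(Y|f(m), f(m'))\big| \le \exp\{N(R(P, f, g_\alpha, N) - I_{P,\hat{V}}(X \wedge \tilde{X}))\} \exp\{N H_{P,\hat{V},\tilde{V}}(Y|X, \tilde{X})\}$$

$$= \exp\{N(H_{P,V}(Y|X) - I_{P,\hat{V},\tilde{V}}(X, Y \wedge \tilde{X}) + R(P, f, g_\alpha, N))\}. \tag{29}$$

Since (27) is valid, from (7) we have

$$\sum_{f(m'):\ f(m) \in \mathcal{T}_{P,\hat{V}}^N(X|f(m')) \cap C} \big|(\mathcal{Y}^N - g_\alpha^{-1}(m)) \cap \mathcal{T}_{P,\hat{V},\tilde{V}}^N(Y|f(m), f(m'))\big| \le \exp\{N H_{P,V}(Y|X)\}.$$

Together with (29) this gives

$$\sum_{f(m'):\ f(m) \in \mathcal{T}_{P,\hat{V}}^N(X|f(m')) \cap C} \big|(\mathcal{Y}^N - g_\alpha^{-1}(m)) \cap \mathcal{T}_{P,\hat{V},\tilde{V}}^N(Y|f(m), f(m'))\big| \le \exp\{N(H_{P,V}(Y|X) - |I_{P,\hat{V},\tilde{V}}(X, Y \wedge \tilde{X}) - R(P, f, g_\alpha, N)|^+)\}. \tag{30}$$

From (28) and (30) we obtain

$$e(f, g_\alpha, N, W) = \max_{m \in \mathcal{M}} W^N(\mathcal{Y}^N - g_\alpha^{-1}(m) \mid f(m)) \le (N + 1)^{|\mathcal{X}|^2|\mathcal{Y}|} \max_{\hat{V}, \tilde{V}, V', V:\, V' \prec_\alpha V} \exp\{-N(D(V\|W|P) + |I_{P,\hat{V},\tilde{V}}(X, Y \wedge \tilde{X}) - R(P, f, g_\alpha, N)|^+)\}.$$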

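If the equality in (22) is in force, then for sufficiently large $N$ we have

$$0 \ge \min_{\hat{V}, \tilde{V}, V', V:\, V' \prec_\alpha V} \big( D(V\|W|P) + |I_{P,\hat{V},\tilde{V}}(X, Y \wedge \tilde{X}) - R(P, f, g_\alpha, N)|^+ - E - \delta_N \big). \tag{31}$$

Suppose that this minimum is attained at $V_0$ with $\hat{V}_0$ and $\tilde{V}_0$, that is

$$0 \ge D(V_0\|W|P) - E - \delta_N + |I_{P,\hat{V}_0,\tilde{V}_0}(X, Y \wedge \tilde{X}) - R(P, f, g_\alpha, N)|^+.$$

Because $|I_{P,\hat{V}_0,\tilde{V}_0}(X, Y \wedge \tilde{X}) - R(P, f, g_\alpha, N)|^+ \ge 0$, we have

$$0 \ge \min_{\hat{V}, \tilde{V}, V':\, V' \prec_\alpha V,\ V:\, D(V\|W|P) \le E + \delta_N - |I_{P,\hat{V}_0,\tilde{V}_0}(X, Y \wedge \tilde{X}) - R(P, f, g_\alpha, N)|^+} \big( D(V\|W|P) - E - \delta_N + |I_{P,\hat{V},\tilde{V}}(X, Y \wedge \tilde{X}) - R(P, f, g_\alpha, N)|^+ \big)$$

$$\ge \min_{\hat{V}, \tilde{V}, V':\, V' \prec_\alpha V,\ V:\, D(V\|W|P) \le E + \delta_N} \big( D(V\|W|P) - E - \delta_N + |I_{P,\hat{V},\tilde{V}}(X, Y \wedge \tilde{X}) - R(P, f, g_\alpha, N)|^+ \big).$$

Since

$$|I_{P,\hat{V},\tilde{V}}(X, Y \wedge \tilde{X}) - R(P, f, g_\alpha, N)|^+ \ge I_{P,\hat{V},\tilde{V}}(X, Y \wedge \tilde{X}) - R(P, f, g_\alpha, N),$$

we deduce that

$$R(P, f, g_\alpha, N) \ge \min_{\hat{V}, \tilde{V}, V':\, V' \prec_\alpha V,\ V:\, D(V\|W|P) \le E + \delta_N} \big( D(V\|W|P) + I_{P,\hat{V},\tilde{V}}(X, Y \wedge \tilde{X}) - E - \delta_N \big). \tag{32}$$

From (32), (21) and (26) we obtain (23). Proposition 1 is proved.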

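Remark 2: The functions $R_\alpha(P, E, W)$ and $R_x(P, E, W)$ depend on $E$ continuously.

Lemma 6: Let us introduce the following functions:

$$R_{\alpha,r}(P, E, W) \triangleq \min_{V':\, V' \prec_\alpha V,\ V:\, D(V\|W|P) \le E} \big| I_{P,V'}(Y \wedge \tilde{X}) + D(V\|W|P) - E \big|^+, \tag{33}$$

$$R_{\alpha,x}(P, E, W) \triangleq \min_{\hat{V}, \tilde{V}, V':\, V' \prec_\alpha V,\ V} \big\{ I_{P,\hat{V}}(X \wedge \tilde{X}) + \big| I_{P,\hat{V},\tilde{V}}(Y \wedge \tilde{X}|X) + D(V\|W|P) - E \big|^+ \big\}. \tag{34}$$

Then for all $P$ and $E > 0$

$$R_\alpha(P, E, W) \ge \max[R_{\alpha,x}(P, E, W), R_{\alpha,r}(P, E, W)].$$

Proof: The inequality

$$R_\alpha(P, E, W) \ge R_{\alpha,x}(P, E, W)$$

follows from definitions (21) and (34). For the proof of the inequality

$$R_\alpha(P, E, W) \ge R_{\alpha,r}(P, E, W)$$

note that

$$I_{P,\hat{V}}(X \wedge \tilde{X}) + I_{P,\hat{V},\tilde{V}}(Y \wedge \tilde{X}|X) = I_{P,\hat{V},\tilde{V}}(X, Y \wedge \tilde{X}) \ge I_{P,V'}(Y \wedge \tilde{X})$$

and then compare (21) and (33) using the inequality $|a + b|^+ \le |a|^+ + |b|^+$, which is valid for any real $a$ and $b$.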

Lemma 7: A point $E_\alpha^*(P, W)$ exists such that

$$\max[R_{\alpha,x}(P, E, W), R_{\alpha,r}(P, E, W)] = \begin{cases} R_{\alpha,r}(P, E, W), & \text{when } E \le E_\alpha^*(P, W), \\ R_{\alpha,x}(P, E, W), & \text{when } E \ge E_\alpha^*(P, W). \end{cases}$$

Proof: Note that the functions $R_{\alpha,r}(P, E, W)$ and $R_{\alpha,x}(P, E, W)$ are nonnegative and decreasing in $E$. Let us first prove that for any $E > E' > 0$

$$R_{\alpha,x}(P, E', W) \le R_{\alpha,x}(P, E, W) + E - E'. \tag{35}$$

In accordance with (34), bearing in mind the inequality $|a + b|^+ \le |a|^+ + |b|^+$, we obtain

$$R_{\alpha,x}(P, E', W) = \min_{\hat{V}, \tilde{V}, V':\, V' \prec_\alpha V,\ V} \big\{ I_{P,\hat{V}}(X \wedge \tilde{X}) + \big| I_{P,\hat{V},\tilde{V}}(Y \wedge \tilde{X}|X) + D(V\|W|P) - E' + E - E \big|^+ \big\} \le R_{\alpha,x}(P, E, W) + E - E'.$$

Denote by $E_{\alpha,r}^0(P, W)$ the least value of $E$ for which $R_{\alpha,r}(P, E, W) = 0$. Let us show that for any $E$ and $E'$ such that

$$0 < E' < E < E_{\alpha,r}^0(P, W),$$

the inequality

$$R_{\alpha,r}(P, E, W) + E - E' \le R_{\alpha,r}(P, E', W) \tag{36}$$

holds. Indeed, in the interval $[0, E_{\alpha,r}^0(P, W))$ the function $R_{\alpha,r}(P, E, W)$ is strictly positive, so for such $E$ and $E'$

$$R_{\alpha,r}(P, E', W) = \min_{V':\, V' \prec_\alpha V,\ V:\, D(V\|W|P) \le E'} \big( I_{P,V'}(Y \wedge \tilde{X}) + D(V\|W|P) - E \big) + E - E'$$

$$\ge \min_{V':\, V' \prec_\alpha V,\ V:\, D(V\|W|P) \le E} \big( I_{P,V'}(Y \wedge \tilde{X}) + D(V\|W|P) - E \big) + E - E' = R_{\alpha,r}(P, E, W) + E - E'.$$

Denote by $E_\alpha^*(P, W)$ the smallest $E$ for which

$$R_{\alpha,r}(P, E, W) \le R_{\alpha,x}(P, E, W).$$

Let us show that this inequality holds for all $E$ greater than $E_\alpha^*(P, W)$. Consider two cases.

If $0 < E_\alpha^*(P, W) < E_{\alpha,r}^0(P, W)$, then it follows from (33), (34), (35) and (36) that for all $E$ from the interval $(E_\alpha^*(P, W), E_{\alpha,r}^0(P, W))$

$$R_{\alpha,r}(P, E, W) + E - E_\alpha^*(P, W) \le R_{\alpha,r}(P, E_\alpha^*(P, W), W) \le R_{\alpha,x}(P, E_\alpha^*(P, W), W) \le R_{\alpha,x}(P, E, W) + E - E_\alpha^*(P, W).$$

If $E_{\alpha,r}^0(P, W) \le E_\alpha^*(P, W)$, then for all $E$ greater than $E_\alpha^*(P, W)$ we have

$$R_{\alpha,x}(P, E, W) \ge 0 = R_{\alpha,r}(P, E, W).$$

In this case $E_\alpha^*(P, W) = E_{\alpha,r}^0(P, W)$.

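Lemma 8: For each $\alpha$-decoding defined in (16), (17), or (18)

$$R_{\alpha,x}(P, E, W) \le R_x(P, E, W), \tag{37}$$

moreover, for maximum-likelihood decoding given by (16) the equality holds.

Proof: First we prove the inequality (37). By (19) we have

$$D(V\|W|P) + I_{P,\hat{V},\tilde{V}}(Y \wedge \tilde{X}|X) = \sum_{x, \tilde{x}, y} P(x)\hat{V}(\tilde{x}|x)\tilde{V}(y|x, \tilde{x}) \log \frac{\tilde{V}(y|x, \tilde{x})}{W(y|x)}, \tag{38}$$

and by (20)

$$D(V'\|W|P) + I_{P,\hat{V},\tilde{V}}(Y \wedge X|\tilde{X}) = \sum_{x, \tilde{x}, y} P(x)\hat{V}(\tilde{x}|x)\tilde{V}(y|x, \tilde{x}) \log \frac{\tilde{V}(y|x, \tilde{x})}{W(y|\tilde{x})}, \tag{39}$$

the left parts of which are equal when $V' = V$. Then from (34)

$$R_{\alpha,x}(P, E, W) \le \min_{\hat{V}, \tilde{V}, V':\, V' = V} \big\{ I_{P,\hat{V}}(X \wedge \tilde{X}) + \big| (1/2)(I_{P,\hat{V},\tilde{V}}(Y \wedge \tilde{X}|X) + D(V\|W|P)) + (1/2)(I_{P,\hat{V},\tilde{V}}(Y \wedge X|\tilde{X}) + D(V'\|W|P)) - E \big|^+ \big\}.$$

From (11), (38) and (39), denoting

$$\tilde{V}_1(y|x, \tilde{x}) = \frac{\sqrt{W(y|x)W(y|\tilde{x})}}{\sum_{y'} \sqrt{W(y'|x)W(y'|\tilde{x})}} = (\exp d_B(x, \tilde{x})) \sqrt{W(y|x)W(y|\tilde{x})},$$

we have

$$R_{\alpha,x}(P, E, W) \le \min_{\hat{V}, \tilde{V}} \big( I_{P,\hat{V}}(X \wedge \tilde{X}) + \big| D(\tilde{V}\|\tilde{V}_1|P \circ \hat{V}) + \mathbf{E}_{P,\hat{V}}\, d_B(X, \tilde{X}) - E \big|^+ \big)$$

$$= \min_{\hat{V}} \big( I_{P,\hat{V}}(X \wedge \tilde{X}) + \big| \mathbf{E}_{P,\hat{V}}\, d_B(X, \tilde{X}) - E \big|^+ \big) = R_x(P, E, W).$$

Let us now prove that in the case of maximum-likelihood decoding

$$R_x(P, E, W) = R_{\alpha,x}(P, E, W). \tag{40}$$

From the condition $V' \prec_\alpha V$ and from (16) we have

$$D(V'\|W|P) + H_{P,V'}(Y|\tilde{X}) \le D(V\|W|P) + H_{P,V}(Y|X).$$

In accordance with (38), (39) and the last inequality we can deduce

$$D(V\|W|P) + I_{P,\hat{V},\tilde{V}}(Y \wedge \tilde{X}|X) \ge D(\tilde{V}\|\tilde{V}_1|P \circ \hat{V}) + \mathbf{E}_{P,\hat{V}}\, d_B(X, \tilde{X}) \ge \mathbf{E}_{P,\hat{V}}\, d_B(X, \tilde{X}).$$

Hence

$$I_{P,\hat{V}}(X \wedge \tilde{X}) + \big| I_{P,\hat{V},\tilde{V}}(Y \wedge \tilde{X}|X) + D(V\|W|P) - E \big|^+ \ge I_{P,\hat{V}}(X \wedge \tilde{X}) + \big| \mathbf{E}_{P,\hat{V}}\, d_B(X, \tilde{X}) - E \big|^+,$$

which is equivalent to the inequality inverse to (37), and therefore (40) holds.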
Lemma 9: For each $\alpha$-decoding

$$R_{\alpha,r}(P, E, W) \le R_r(P, E, W), \tag{41}$$

moreover, for

- maximum-likelihood decoding,

- minimum-entropy decoding,

- minimum-divergence decoding

the equality holds.

Proof: The inequality (41) is valid because (see (33) and (10))

$$R_{\alpha,r}(P, E, W) \le \min_{V':\, V' = V,\ V:\, D(V\|W|P) \le E} \big| I_{P,V'}(Y \wedge \tilde{X}) + D(V\|W|P) - E \big|^+ = R_r(P, E, W).$$

For the case of maximum-likelihood decoding (16)

$$V' \prec_\alpha V \iff \begin{cases} D(V'\|W|P) + H_{P,V'}(Y|\tilde{X}) \le D(V\|W|P) + H_{P,V}(Y|X), \\ PV' = PV, \end{cases}$$

hence

$$D(V'\|W|P) + I_{P,V}(X \wedge Y) \le D(V\|W|P) + I_{P,V'}(\tilde{X} \wedge Y). \tag{42}$$

Denote for brevity

$$\mathcal{G} = \{(V', V) : PV' = PV,\ D(V\|W|P) \le E \text{ and (42) holds}\}.$$

Thus, we can write instead of (33)

$$R_{\alpha,r}(P, E, W) = \min_{(V', V) \in \mathcal{G}} \big| I_{P,V'}(\tilde{X} \wedge Y) + D(V\|W|P) - E \big|^+ = \min\{B_1, B_2\}, \tag{43}$$

where

$$B_1 = \min_{(V', V) \in \mathcal{G}:\ D(V'\|W|P) \le D(V\|W|P)} \big| I_{P,V'}(\tilde{X} \wedge Y) + D(V\|W|P) - E \big|^+,$$

$$B_2 = \min_{(V', V) \in \mathcal{G}:\ D(V'\|W|P) \ge D(V\|W|P)} \big| I_{P,V'}(\tilde{X} \wedge Y) + D(V\|W|P) - E \big|^+.$$

In turn, neither $B_1$ nor $B_2$ is less than $R_r(P, E, W)$. Indeed, if inequalities (42) and $D(V'\|W|P) \ge D(V\|W|P)$ hold simultaneously, then $I_{P,V}(X \wedge Y) \le I_{P,V'}(\tilde{X} \wedge Y)$. Hence

$$B_2 \ge \min_{(V', V) \in \mathcal{G}:\ D(V'\|W|P) \ge D(V\|W|P)} \big| I_{P,V}(X \wedge Y) + D(V\|W|P) - E \big|^+ = R_r(P, E, W).$$

But

$$B_1 \ge \min_{(V', V) \in \mathcal{G}:\ D(V'\|W|P) \le D(V\|W|P)} \big| I_{P,V'}(\tilde{X} \wedge Y) + D(V'\|W|P) - E \big|^+ = \min_{V':\, D(V'\|W|P) \le E} \big| I_{P,V'}(\tilde{X} \wedge Y) + D(V'\|W|P) - E \big|^+ = R_r(P, E, W).$$

The last two inequalities along with (41) and (43) give the statement of the Lemma for maximum-likelihood decoding.

For the case of minimum-divergence decoding (18)

$$V' \prec_\alpha V \iff \{D(V'\|W|P) \le D(V\|W|P),\ PV' = PV\}.$$

Then according to (33) and (10)

$$R_{\alpha,r}(P, E, W) \ge \min_{V':\, D(V'\|W|P) \le E} \big| I_{P,V'}(\tilde{X} \wedge Y) + D(V'\|W|P) - E \big|^+ = R_r(P, E, W).$$

For the case of minimum-entropy decoding, from (17)

$$V' \prec_\alpha V \iff \{H_{P,V'}(Y|\tilde{X}) \le H_{P,V}(Y|X),\ PV' = PV\},$$

which implies $I_{P,V'}(\tilde{X} \wedge Y) \ge I_{P,V}(X \wedge Y)$, whence

$$R_{\alpha,r}(P, E, W) \ge \min_{V:\, D(V\|W|P) \le E} \big| I_{P,V}(X \wedge Y) + D(V\|W|P) - E \big|^+ = R_r(P, E, W).$$

Thus the proof of Theorem 1 is completed by combining the results of Lemmas 4-9 and Proposition 1.
IV. Proof of Theorem 2

Let $E$ and $\delta$ be given such that $E > \delta > 0$. Let a code $(f, g)$ of length $N$ be defined, let $R$ be the rate of the code, and let the average error probability satisfy the analog of condition (3) for $E - \delta$:

$$\bar{e}(f, g, N, W) = \exp\{-N(E - \delta)\},$$

which according to Definition 3 is

$$\frac{1}{M} \sum_{m} W^N(\mathcal{Y}^N - g^{-1}(m) \mid f(m)) = \exp\{-N(E - \delta)\}. \tag{44}$$

The number of messages $M$ may be represented as the sum of the numbers of codewords of different types, $M = \sum_P |f(\mathcal{M}) \cap \mathcal{T}_P^N(X)|$, and the number of all types $P \in \mathcal{P}_N(\mathcal{X})$ is less than $(N + 1)^{|\mathcal{X}|}$ (see (4)). Therefore there exists a "major" type $P^*$ such that

$$|f(\mathcal{M}) \cap \mathcal{T}_{P^*}^N(X)| \ge M(N + 1)^{-|\mathcal{X}|}. \tag{45}$$
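The pigeonhole step behind (45) amounts to grouping the codewords by type and picking the most populated group. A minimal sketch follows, with a toy codebook assumed purely for illustration (all binary words of length 4).

```python
from collections import Counter
from itertools import product

def major_type(codewords, alphabet):
    """Return (symbol counts of the most frequent type, number of codewords of that type)."""
    by_type = Counter(tuple(cw.count(a) for a in alphabet) for cw in codewords)
    return by_type.most_common(1)[0]

ALPHABET = (0, 1)
N = 4
codebook = list(product(ALPHABET, repeat=N))  # hypothetical code: M = 16

counts, size = major_type(codebook, ALPHABET)
M = len(codebook)
pigeonhole_bound = M * (N + 1) ** (-len(ALPHABET))  # right side of (45)
```

Here the major type is the balanced one with symbol counts $(2, 2)$, shared by 6 of the 16 codewords, well above the bound $M(N + 1)^{-|\mathcal{X}|} = 0.64$.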

Now for any conditional type $V$, in the left part of (44) we can consider only codewords of type $P^*$ and the part of the output vectors $\mathbf{y}$ of conditional type $V$:

$$\sum_{m:\ f(m) \in \mathcal{T}_{P^*}^N(X)} W^N\big(\mathcal{T}_{P^*,V}^N(Y|f(m)) - g^{-1}(m) \mid f(m)\big) \le M \exp\{-N(E - \delta)\},$$

or

$$\sum_{m:\ f(m) \in \mathcal{T}_{P^*}^N(X)} \big\{ |\mathcal{T}_{P^*,V}^N(Y|f(m))| - |\mathcal{T}_{P^*,V}^N(Y|f(m)) \cap g^{-1}(m)| \big\}\, W^N(\mathbf{y}|\mathbf{x}) \le M \exp\{-N(E - \delta)\}.$$

From (9) we obtain

$$\sum_{m:\ f(m) \in \mathcal{T}_{P^*}^N(X)} |\mathcal{T}_{P^*,V}^N(Y|f(m))| - M \exp\{N(D(V\|W|P^*) + H_{P^*,V}(Y|X) - E + \delta)\} \le \sum_{m:\ f(m) \in \mathcal{T}_{P^*}^N(X)} |\mathcal{T}_{P^*,V}^N(Y|f(m)) \cap g^{-1}(m)|.$$

It follows from the definition of the decoding function $g$ that the sets $g^{-1}(m)$ are disjoint, therefore

$$\sum_{m:\ f(m) \in \mathcal{T}_{P^*}^N(X)} |\mathcal{T}_{P^*,V}^N(Y|f(m)) \cap g^{-1}(m)| \le |\mathcal{T}_{P^*V}^N(Y)|.$$

Then from (7) we have

$$|f(\mathcal{M}) \cap \mathcal{T}_{P^*}^N(X)|\, (N + 1)^{-|\mathcal{X}||\mathcal{Y}|} \exp\{N H_{P^*,V}(Y|X)\} - M \exp\{N(D(V\|W|P^*) + H_{P^*,V}(Y|X) - E + \delta)\} \le \exp\{N H_{P^*V}(Y)\}.$$

Taking into account (45) we come to the estimate

$$M \le \frac{\exp\{N I_{P^*,V}(X \wedge Y)\}}{(N + 1)^{-|\mathcal{X}|(|\mathcal{Y}| + 1)} - \exp\{N(D(V\|W|P^*) - E + \delta)\}}.$$

The right part of this inequality can be minimized by the choice of the conditional type $V$, keeping the denominator positive, which takes place for large $N$ when the following inequality holds:

$$D(V\|W|P^*) < E - \delta.$$

The statement of Theorem 2 follows from the definitions of $\bar{R}(E, W)$ and $R_{sp}(E, W)$ and from the continuity in $E$ of the function $R_{sp}(P, E, W)$.

The same bound for the case of maximal error probability can be proved similarly, but it also follows from the given proof.

Example. We shall calculate $R_{sp}(E, W)$ for the binary symmetric channel (BSC). Consider a BSC $W$ with

$$\mathcal{X} = \{0, 1\}, \qquad \mathcal{Y} = \{0', 1'\},$$

$$W(0'|1) = W(1'|0) = w_1 > 0, \qquad W(0'|0) = W(1'|1) = w_2 > 0.$$

Correspondingly, for another BSC $V$ on the same $\mathcal{X}$ and $\mathcal{Y}$ we denote

$$V(0'|1) = V(1'|0) = v_1, \qquad V(0'|0) = V(1'|1) = v_2.$$

It is clear that $w_1 + w_2 = 1$, $v_1 + v_2 = 1$.

The maximal value of the mutual information $I_{P,V}(X \wedge Y)$ in the definition of $R_{sp}(E, W)$ is attained at $p^*(0) = p^*(1) = 1/2$ because of the symmetry of the channel, therefore

$$I_{P^*,V}(X \wedge Y) = 1 + v_1 \log v_1 + v_2 \log v_2.$$

The condition $D(V\|W|P^*) \le E$ takes the following form:

$$v_1 \log \frac{v_1}{w_1} + v_2 \log \frac{v_2}{w_2} \le E.$$

So the following constrained extremum problem must be solved (see (12)):

$$\begin{cases} -(1 + v_1 \log v_1 + v_2 \log v_2) = \max, \\ -\big(v_1 \log \frac{v_1}{w_1} + v_2 \log \frac{v_2}{w_2} - E\big) \ge 0, \\ v_1 + v_2 = 1. \end{cases}$$

Using the Kuhn-Tucker theorem, we find that $v_1, v_2$ give the solution of the problem if and only if there exist $\lambda_1 > 0$, $\lambda_2 > 0$ satisfying the following conditions:

$$\begin{cases} \frac{\partial}{\partial v_i}(-1 - v_1 \log v_1 - v_2 \log v_2) + \lambda_1 \frac{\partial}{\partial v_i}\big(-v_1 \log \frac{v_1}{w_1} - v_2 \log \frac{v_2}{w_2} + E\big) + \lambda_2 \frac{\partial}{\partial v_i}(v_1 + v_2 - 1) = 0, \quad i = 1, 2, \\ \lambda_1\big(v_1 \log \frac{v_1}{w_1} + v_2 \log \frac{v_2}{w_2} - E\big) = 0, \end{cases}$$

which for $v_1$ and $v_2$ giving the maximum are equivalent to

$$\begin{cases} \log v_i + \log e = -\lambda_1\big(\log \frac{v_i}{w_i} + \log e\big) + \lambda_2, \quad i = 1, 2, \\ v_1 \log \frac{v_1}{w_1} + v_2 \log \frac{v_2}{w_2} = E. \end{cases} \tag{46}$$

Solving the first two equations from (46) we obtain

$$v_i = 2^{\frac{\lambda_2 - (1 + \lambda_1)\log e}{1 + \lambda_1}}\, w_i^{\frac{\lambda_1}{1 + \lambda_1}}, \qquad i = 1, 2.$$

Let us denote $\frac{\lambda_1}{1 + \lambda_1} = s$ and remember that $v_1 + v_2 = 1$; then, as functions of the parameter $s \in (0, 1)$,

$$v_1 = \frac{w_1^s}{w_1^s + w_2^s}, \qquad v_2 = \frac{w_2^s}{w_1^s + w_2^s}.$$

From the third condition in (46) we obtain the parametric expressions for $E$ and $R_{sp}(E, W)$:

$$E(s) = \frac{w_1^s}{w_1^s + w_2^s} \log \frac{w_1^{s-1}}{w_1^s + w_2^s} + \frac{w_2^s}{w_1^s + w_2^s} \log \frac{w_2^{s-1}}{w_1^s + w_2^s},$$

$$R_{sp}(s) = 1 + \frac{w_1^s}{w_1^s + w_2^s} \log \frac{w_1^s}{w_1^s + w_2^s} + \frac{w_2^s}{w_1^s + w_2^s} \log \frac{w_2^s}{w_1^s + w_2^s}.$$

It is not difficult to see that we have arrived at the same relation between $R_{sp}$ and $E$ as that given in Theorem 5.8.3 of Gallager's book [13].
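The parametric solution of the Example can be cross-checked against a direct one-dimensional search. The sketch below evaluates $(E(s), R_{sp}(s))$ for a BSC and compares the result with a brute-force minimization of $I_{P^*,V}(X \wedge Y) = 1 - h(v_1)$ over symmetric $V$ subject to the divergence constraint. The crossover probability 0.1, the value $s = 1/2$ and the grid resolution are assumptions chosen for illustration.

```python
from math import log2

def binary_entropy(v):
    return 0.0 if v in (0.0, 1.0) else -v * log2(v) - (1 - v) * log2(1 - v)

def binary_divergence(v, w):
    """v log(v/w) + (1-v) log((1-v)/(1-w)), base two, with 0 log 0 = 0."""
    return sum(a * log2(a / b) for a, b in ((v, w), (1 - v, 1 - w)) if a > 0)

def bsc_parametric_point(w1, s):
    """(E(s), R_sp(s)) from the parametric expressions of the Example."""
    w2 = 1 - w1
    z = w1 ** s + w2 ** s
    v1, v2 = w1 ** s / z, w2 ** s / z
    E = v1 * log2(v1 / w1) + v2 * log2(v2 / w2)
    R = 1 + v1 * log2(v1) + v2 * log2(v2)
    return E, R

def bsc_sphere_packing_brute_force(w1, E, steps=10000):
    """min of 1 - h(v1) over symmetric V with D(V||W|P*) <= E, on a grid."""
    grid = (i / steps for i in range(steps + 1))
    return min(1 - binary_entropy(v) for v in grid
               if binary_divergence(v, w1) <= E + 1e-9)

E_half, R_half = bsc_parametric_point(0.1, 0.5)
R_check = bsc_sphere_packing_brute_force(0.1, E_half)
E_one, R_one = bsc_parametric_point(0.1, 1.0)  # s = 1 gives V = W
```

For $w_1 = 0.1$ and $s = 1/2$ the parametric solution gives $v_1 = 1/4$. At $s = 1$ it returns $E = 0$ and the capacity $1 - h(w_1)$, in agreement with Remark 1.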



V. Comparison of the Bounds for E-capacity

Lemma 10: For a given DMC $W$, for a type $P$ and numbers $0 \le E' \le E$

$$R_r(P, E, W) = \min_{E':\, E' \le E} \big| R_{sp}(P, E', W) + E' - E \big|^+.$$

Proof: Applying definitions (10) and (12) we see:

$$R_r(P, E, W) = \min_{V:\, D(V\|W|P) \le E} \big| I_{P,V}(X \wedge Y) + D(V\|W|P) - E \big|^+$$

$$= \min_{E':\, E' \le E}\ \min_{V:\, D(V\|W|P) = E'} \big| I_{P,V}(X \wedge Y) + E' - E \big|^+ = \min_{E':\, E' \le E} \big| R_{sp}(P, E', W) + E' - E \big|^+.$$
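Lemma 10 can be checked numerically on the BSC of the Example. The sketch below restricts $V$ to symmetric channels and $P$ to the uniform distribution, as the Example does; this restriction, the grids and the test values of $E$ are assumptions. It computes $R_r$ once directly from (10) and once through $R_{sp}$ as in the Lemma, and the two values should agree up to the grid resolution.

```python
from math import log2

def binary_entropy(v):
    return 0.0 if v in (0.0, 1.0) else -v * log2(v) - (1 - v) * log2(1 - v)

def binary_divergence(v, w):
    return sum(a * log2(a / b) for a, b in ((v, w), (1 - v, 1 - w)) if a > 0)

W1 = 0.1  # hypothetical BSC crossover probability
V_GRID = [i / 2000 for i in range(2001)]
# For each symmetric V: (D(V||W|P*), I_{P*,V}(X ^ Y)).
POINTS = [(binary_divergence(v, W1), 1 - binary_entropy(v)) for v in V_GRID]

def r_sp(E):
    """Grid version of (12)."""
    return min(I for D, I in POINTS if D <= E + 1e-12)

def r_r_direct(E):
    """Grid version of (10)."""
    return min(max(I + D - E, 0.0) for D, I in POINTS if D <= E + 1e-12)

def r_r_via_lemma10(E, steps=400):
    """min over E' <= E of |R_sp(E') + E' - E|^+ on a uniform grid of E'."""
    return min(max(r_sp(E * k / steps) + E * k / steps - E, 0.0)
               for k in range(steps + 1))

gaps = {E: abs(r_r_direct(E) - r_r_via_lemma10(E)) for E in (0.05, 0.3)}
```

The second test value, $E = 0.3$, lies beyond the point where the slope of $R_{sp}$ equals $-1$ for this channel, so it also exercises the regime in which $R_r$ is strictly smaller than $R_{sp}$.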



Lemma 11: Involving

$$E_{cr}(P, W) = \min\Big\{E : \frac{\partial R_{sp}(P, E, W)}{\partial E} \ge -1\Big\},$$

we can write for all $E > 0$

$$R_r(P, E, W) = \begin{cases} R_{sp}(P, E, W), & \text{if } E \le E_{cr}(P, W), \\ \big| R_{sp}(P, E_{cr}(P, W), W) + E_{cr}(P, W) - E \big|^+, & \text{if } E \ge E_{cr}(P, W). \end{cases}$$

Proof: Since the function $R_{sp}(P, E, W)$ is convex in $E$, for values $E \le E_{cr}(P, W)$ the slope of the tangent is not greater than $-1$, and for $E \ge E_{cr}(P, W)$ it is equal to or greater than $-1$. In other words,

$$\frac{R_{sp}(P, E, W) - R_{sp}(P, E', W)}{E - E'} \le -1, \quad \text{when } E' < E \le E_{cr}(P, W),$$

whence

$$R_{sp}(P, E, W) + E \le R_{sp}(P, E', W) + E',$$

and consequently

$$\min_{E':\, E' \le E \le E_{cr}(P, W)} \big( R_{sp}(P, E', W) + E' \big) = R_{sp}(P, E, W) + E.$$

From this equality and Lemma 10 we obtain the statement of the Lemma for the case $E \le E_{cr}(P, W)$. Now, if $E_{cr}(P, W) \le E' < E$, then

$$\frac{R_{sp}(P, E, W) - R_{sp}(P, E', W)}{E - E'} \ge -1,$$

or

$$R_{sp}(P, E, W) + E \ge R_{sp}(P, E', W) + E',$$

and consequently

$$\min_{E':\, E_{cr}(P, W) \le E' \le E} \big( R_{sp}(P, E', W) + E' \big) = R_{sp}(P, E_{cr}(P, W), W) + E_{cr}(P, W).$$

Again, using Lemma 10 and the last equality we obtain that for the case $E \ge E_{cr}(P, W)$

$$R_r(P, E, W) = \min\Big\{ \min_{E':\, E_{cr}(P, W) \le E' \le E} \big| R_{sp}(P, E', W) + E' - E \big|^+,\ \min_{E':\, E' \le E_{cr}(P, W)} \big| R_{sp}(P, E', W) + E' - E \big|^+ \Big\}$$

$$= \big| R_{sp}(P, E_{cr}(P, W), W) + E_{cr}(P, W) - E \big|^+.$$

Thus in the interval $(0, E_{cr}(P, W)]$ the functions $R(P, E, W)$ are exactly determined by

$$R(P, E, W) = R_{sp}(P, E, W) = R_r(P, E, W),$$

and in the interval $(0, E_{cr}(W)]$,

$$R(E, W) = R_{sp}(E, W) = R_r(E, W).$$

So the proof of Theorem 3 is completed.